Exploring episodes data for Game of Thrones
Game of Thrones
Game of Thrones is a fantasy drama television series based on the "A Song of Ice and Fire" novels by George R. R. Martin. It premiered on HBO on April 17, 2011, and concluded on May 19, 2019, spanning eight seasons. The show is set in the fictional continents of Westeros and Essos and interweaves multiple plotlines involving noble families vying for control of the Iron Throne and the Seven Kingdoms of Westeros. With its intricate political intrigue, complex characters, and epic battles, Game of Thrones became a cultural phenomenon, garnering widespread critical acclaim and a dedicated fanbase.
The series received numerous awards and nominations, including 59 Primetime Emmy Awards, making it one of the most awarded series in Emmy history. It also achieved commercial success, with its viewership steadily growing throughout its run. The final season, however, received mixed reviews from both critics and fans.
Game of Thrones is known for its shocking plot twists, morally ambiguous characters, and high production values, including lavish costumes, stunning visual effects, and expansive filming locations. It has left a significant impact on popular culture and has inspired countless discussions, theories, and spin-off projects.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Path to your image file
image_path = "C:/Users/MAGA/Desktop/Game of Thrones/GOT.JPG"
# Load the image
img = mpimg.imread(image_path)
# Display the image
plt.imshow(img)
plt.axis('off') # Hide axes
plt.show()
To begin this exploratory analysis, first import libraries and define functions for plotting the data using matplotlib. Depending on the data, not all plots will be made.
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt # plotting
import numpy as np # linear algebra
import os # accessing directory structure
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # for making plots with seaborn
color = sns.color_palette()
sns.set(rc={'figure.figsize':(25,15)})
There is 1 csv file in the current version of the dataset.
print(os.listdir('C:/Users/MAGA/Desktop/Game of Thrones'))
df_ep = pd.read_csv('C:/Users/MAGA/Desktop/Game of Thrones/got_csv_full.csv')
df_ep.head(10)
print('Number of episodes in the dataset : ' , len(df_ep))
['.ipynb_checkpoints', 'Analysis Game of Thrones Episode Data.ipynb', 'GOT.JPG', 'got_csv_full.csv']
Number of episodes in the dataset :  73
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
df_ep = pd.read_csv('C:/Users/MAGA/Desktop/Game of Thrones/got_csv_full.csv')
# Print the column names to verify
print(df_ep.columns)
# Print the first 10 rows of the dataset
print(df_ep[['Imdb_Rating', 'US_viewers_million']].head(10))
# Print the number of episodes in the dataset
print('Number of episodes in the dataset:', len(df_ep))
# Summary statistics for Rating and Viewership
summary_stats = df_ep[['Imdb_Rating', 'US_viewers_million']].describe()
print(summary_stats)
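# Pairplot for Rating and Viewership
g = sns.pairplot(df_ep[['Imdb_Rating', 'US_viewers_million']])
# pairplot returns a grid of axes, so set a figure-level title
# (plt.title would only label the last subplot)
g.fig.suptitle('Pairplot of Rating and Viewership', y=1.02)
plt.show()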
# Correlation between Rating and Viewership
correlation = df_ep[['Imdb_Rating', 'US_viewers_million']].corr()
print(correlation)
# Heatmap of correlations
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap of Rating and Viewership')
plt.show()
Index(['Season', 'Episode_Number', 'Number_in_Season', 'Episode_Name',
'Director', 'Writer', 'Original_Air_Date', 'US_viewers_million',
'Runtime_mins', 'IMDB_Description', 'IMDB_votes', 'Imdb_Rating',
'Notable_Death_Count'],
dtype='object')
Imdb_Rating US_viewers_million
0 9.0 2.22
1 8.8 2.20
2 8.7 2.44
3 8.8 2.45
4 9.1 2.58
5 9.2 2.44
6 9.3 2.40
7 9.1 2.72
8 9.6 2.66
9 9.5 3.04
Number of episodes in the dataset: 73
Imdb_Rating US_viewers_million
count 73.000000 73.000000
mean 8.902740 6.447808
std 0.898142 2.827372
min 4.300000 2.200000
25% 8.700000 3.900000
50% 9.000000 6.640000
75% 9.400000 7.820000
max 9.900000 13.610000
                    Imdb_Rating  US_viewers_million
Imdb_Rating            1.000000           -0.448423
US_viewers_million    -0.448423            1.000000
df_ep_clean = pd.read_csv('C:/Users/MAGA/Desktop/Game of Thrones/got_csv_full.csv')
df_ep_clean.head(10)
print('Number of episodes in the dataset : ' , len(df_ep_clean))
Number of episodes in the dataset : 73
#plotPerColumnDistribution(df_ep, 10, 5)
print(df_ep.dtypes)
Season                   int64
Episode_Number           int64
Number_in_Season         int64
Episode_Name            object
Director                object
Writer                  object
Original_Air_Date       object
US_viewers_million     float64
Runtime_mins             int64
IMDB_Description        object
IMDB_votes               int64
Imdb_Rating            float64
Notable_Death_Count      int64
dtype: object
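The dtype listing shows Original_Air_Date stored as object (plain strings), which rules out date arithmetic and chronological sorting. Below is a minimal sketch of one way to convert it. It assumes every value follows the "April 17, 2011" pattern seen in the first rows. The three-row `sample` frame and the `days_since_previous` column are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Illustrative three-row sample using the dataset's column names.
sample = pd.DataFrame({
    "Episode_Name": ["Winter Is Coming", "The Kingsroad", "Lord Snow"],
    "Original_Air_Date": ["April 17, 2011", "April 24, 2011", "May 1, 2011"],
})

# Parse strings such as "April 17, 2011" into datetime64 values.
# Assumption: every row follows this "<Month> <day>, <year>" pattern.
sample["Original_Air_Date"] = pd.to_datetime(
    sample["Original_Air_Date"], format="%B %d, %Y"
)

# Days elapsed since the previous episode (NaN for the first row).
sample["days_since_previous"] = sample["Original_Air_Date"].diff().dt.days

print(sample.dtypes)
print(sample)
```

The same `pd.to_datetime` call could be applied to `df_ep['Original_Air_Date']`, provided the full column uses this format.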
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
plt.style.use('ggplot')
import seaborn as sns
color = sns.color_palette()
sns.set(rc={'figure.figsize':(25,15)})
import plotly
plotly.offline.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
# Install cufflinks if not already installed
try:
    import cufflinks as cf
except ModuleNotFoundError:
    !pip install cufflinks
    import cufflinks as cf
import warnings
warnings.filterwarnings('ignore')
The next code cell defines a helper function, plotPerColumnDistribution, for plotting the distribution of each column.
import math

def plotPerColumnDistribution(df, nGraphShown, nGraphPerRow):
    """Plot a histogram (numeric) or bar chart (categorical) for each low-cardinality column."""
    nunique = df.nunique()
    # For display purposes, keep columns with more than 1 and fewer than 50 unique values
    df = df[[col for col in df if nunique[col] > 1 and nunique[col] < 50]]
    nRow, nCol = df.shape
    columnNames = list(df)
    # Number of subplot rows needed to fit every column, rounded up
    nGraphRow = math.ceil(nCol / nGraphPerRow)
    plt.figure(num=None, figsize=(6 * nGraphPerRow, 8 * nGraphRow), dpi=80, facecolor='w', edgecolor='k')
    for i in range(min(nCol, nGraphShown)):
        plt.subplot(nGraphRow, nGraphPerRow, i + 1)
        columnDf = df.iloc[:, i]
        if not np.issubdtype(type(columnDf.iloc[0]), np.number):
            # Non-numeric column: bar chart of value frequencies
            valueCounts = columnDf.value_counts()
            valueCounts.plot.bar()
        else:
            # Numeric column: histogram
            columnDf.hist()
        plt.ylabel('counts')
        plt.xticks(rotation=90)
        plt.title(f'{columnNames[i]} (column {i})')
    plt.tight_layout(pad=1.0, w_pad=1.0, h_pad=1.0)
    plt.show()
# Call the function
plotPerColumnDistribution(df_ep, 10, 5)
df_ep.head(5)
| | Season | Episode_Number | Number_in_Season | Episode_Name | Director | Writer | Original_Air_Date | US_viewers_million | Runtime_mins | IMDB_Description | IMDB_votes | Imdb_Rating | Notable_Death_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | Winter Is Coming | Tim Van Patten | David Benioff & D. B. Weiss | April 17, 2011 | 2.22 | 62 | Jon Arryn, the Hand of the King, is dead. King... | 27685 | 9.0 | 4 |
| 1 | 1 | 2 | 2 | The Kingsroad | Tim Van Patten | David Benioff & D. B. Weiss | April 24, 2011 | 2.20 | 56 | While Bran recovers from his fall, Ned takes o... | 21256 | 8.8 | 3 |
| 2 | 1 | 3 | 3 | Lord Snow | Brian Kirk | David Benioff & D. B. Weiss | May 1, 2011 | 2.44 | 58 | Lord Stark and his daughters arrive at King's ... | 20090 | 8.7 | 0 |
| 3 | 1 | 4 | 4 | Cripples, Bastards, and Broken Things | Brian Kirk | Bryan Cogman | May 8, 2011 | 2.45 | 56 | Eddard investigates Jon Arryn's murder. Jon be... | 19123 | 8.8 | 1 |
| 4 | 1 | 5 | 5 | The Wolf and the Lion | Brian Kirk | David Benioff & D. B. Weiss | May 15, 2011 | 2.58 | 55 | Catelyn has captured Tyrion and plans to bring... | 20062 | 9.1 | 5 |
Distribution graphs (histogram/bar graph) of sampled columns:
plotPerColumnDistribution(df_ep, 10, 5)
number_of_deaths_in_category = df_ep['Notable_Death_Count'].value_counts().sort_values(ascending=True)
data = [go.Pie(
labels = number_of_deaths_in_category.index,
values = number_of_deaths_in_category.values,
hoverinfo = 'label+value'
)]
plotly.offline.iplot(data, filename='Notable_Death_Count')
The most common notable-death counts per episode are 2 (19.2%, 14 episodes), followed by 4 or 1 (17.8%, 13 episodes).
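The percentages read off the pie chart can also be tabulated directly. This is a small sketch using `value_counts` with `normalize=True`. The `deaths` series is made-up illustrative data, not the real column, and `summary` is a hypothetical name.

```python
import pandas as pd

# Made-up death counts for ten episodes (illustrative only, not the real data).
deaths = pd.Series([2, 2, 4, 1, 0, 2, 4, 1, 3, 0], name="Notable_Death_Count")

# Episodes per death count, plus each count's share of all episodes in percent.
summary = pd.DataFrame({
    "episodes": deaths.value_counts(),
    "share_pct": (deaths.value_counts(normalize=True) * 100).round(1),
}).sort_values("episodes", ascending=False)

print(summary)
```

With the notebook's data, `deaths` would be replaced by `df_ep['Notable_Death_Count']`.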
Do any episodes perform really badly or really well?
data = [go.Histogram(
x = df_ep.Imdb_Rating,
xbins = {'start': 1, 'size':0.5, 'end' :10}
)]
print('Average episode rating = ', np.mean(df_ep['Imdb_Rating']))
plotly.offline.iplot(data, filename='overall_rating_distribution')
Average episode rating = 8.902739726027397
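The histogram shows the overall shape of the ratings. To name the episodes at either extreme, one option is to list the lowest- and highest-rated rows and standardize the ratings. This is a sketch under stated assumptions: the `ratings` frame holds invented values and placeholder episode names, and `z_score` is a hypothetical column.

```python
import pandas as pd

# Invented ratings with placeholder names; substitute df_ep for the real data.
ratings = pd.DataFrame({
    "Episode_Name": ["A", "B", "C", "D", "E", "F"],
    "Imdb_Rating": [9.0, 8.8, 9.9, 4.3, 9.1, 8.7],
})

# The two lowest- and two highest-rated episodes.
lowest = ratings.nsmallest(2, "Imdb_Rating")
highest = ratings.nlargest(2, "Imdb_Rating")

# Standardized rating: how many sample standard deviations from the mean.
mean = ratings["Imdb_Rating"].mean()
std = ratings["Imdb_Rating"].std()
ratings["z_score"] = (ratings["Imdb_Rating"] - mean) / std

print(lowest)
print(highest)
print(ratings.sort_values("z_score"))
```

Because the ratings are skewed toward the top of the scale, a large negative z-score is the more informative signal here. High ratings cannot sit far above the mean.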
import seaborn as sns
sns.set_style("darkgrid")
ax = sns.jointplot(data=df_ep, x='Season', y='Imdb_Rating')
Season 1 has the most consistent ratings; Seasons 3 and 6 have the highest ratings overall.
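The joint plot supports this observation visually. A per-season summary would put numbers on it: the standard deviation measures consistency and the mean measures overall level. The sketch below uses a hypothetical two-season `sample` frame, not the real ratings.

```python
import pandas as pd

# Hypothetical ratings for two seasons (illustrative values only).
sample = pd.DataFrame({
    "Season": [1, 1, 1, 2, 2, 2],
    "Imdb_Rating": [9.0, 9.0, 9.0, 8.0, 9.0, 10.0],
})

# Mean shows the overall level; std shows how consistent a season is.
by_season = sample.groupby("Season")["Imdb_Rating"].agg(["mean", "std", "min", "max"])

print(by_season)
```

Applied to `df_ep`, the season with the smallest `std` is the most consistent and the one with the largest `mean` is the highest rated.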
How does the notable death count affect episode ratings?
import seaborn as sns
sns.set_style("darkgrid")
ax = sns.jointplot(x=df_ep['Notable_Death_Count'], y=df_ep['Imdb_Rating'])
Most top-rated episodes have between 0 and 4 notable deaths, neither too few nor too many.
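One way to check this reading of the joint plot is to band the death counts and compare ratings per band. The sketch below splits at 4 deaths, a cut-off assumed from the observation above. The `sample` data and the `death_band` column are hypothetical.

```python
import pandas as pd

# Hypothetical episodes (illustrative values only).
sample = pd.DataFrame({
    "Notable_Death_Count": [0, 2, 4, 5, 8, 1],
    "Imdb_Rating": [9.0, 9.2, 9.4, 8.6, 8.8, 9.0],
})

# Two bands: 0 to 4 deaths, and 5 or more. Intervals are closed on the right,
# so the first band covers (-1, 4] and the second covers (4, inf).
sample["death_band"] = pd.cut(
    sample["Notable_Death_Count"],
    bins=[-1, 4, float("inf")],
    labels=["0-4", "5+"],
)

band_stats = (
    sample.groupby("death_band", observed=True)["Imdb_Rating"]
    .agg(["count", "median"])
)

print(band_stats)
```

The per-band `count` matters as much as the median. If few episodes fall in the "5+" band, any difference in ratings rests on very little data.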
import seaborn as sns
sns.set_style("darkgrid")
ax = sns.jointplot(data=df_ep, x='US_viewers_million', y='Imdb_Rating')
Ratings stay consistent up to roughly 8 million viewers.
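The 8 million figure is read off the plot by eye. A rough check is to split the episodes at that threshold and compare the spread of ratings on each side. The `sample` values below are invented for illustration, and `threshold`, `above_threshold`, and `spread` are hypothetical names.

```python
import pandas as pd

# Invented viewership and rating values (not the real dataset).
sample = pd.DataFrame({
    "US_viewers_million": [2.2, 4.0, 6.6, 7.8, 10.0, 12.0, 13.6],
    "Imdb_Rating": [9.0, 9.0, 9.0, 9.0, 9.5, 6.0, 4.5],
})

threshold = 8.0  # assumed cut-off, taken from the visual reading of the plot
sample["above_threshold"] = sample["US_viewers_million"] > threshold

# A larger std above the threshold would mean ratings become less consistent there.
spread = sample.groupby("above_threshold")["Imdb_Rating"].agg(["count", "mean", "std"])

# Pearson correlation between viewership and rating over all episodes.
pearson_r = sample["US_viewers_million"].corr(sample["Imdb_Rating"])

print(spread)
print("Pearson r:", round(pearson_r, 3))
```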
fig, ax = plt.subplots()
fig.set_size_inches(10, 5)
p = sns.stripplot(x="Imdb_Rating", y="US_viewers_million", data=df_ep, jitter=True, linewidth=1)
title = ax.set_title('Viewers vs. Ratings')
fig, ax = plt.subplots()
fig.set_size_inches(10, 5)
p = sns.stripplot(x="Imdb_Rating", y="Writer", data=df_ep_clean, jitter=True, linewidth=1)
title = ax.set_title('Writers vs. Ratings')
#df_ep_clean_1 = df_ep_clean
#df_ep_clean_1['Writer'] = df_ep_clean['Writer'].apply(lambda x: x.replace(' ', ' ') if ',' in str(x) else x)
fig, ax = plt.subplots()
fig.set_size_inches(10, 5)
p = sns.stripplot(x="Imdb_Rating", y="Director", data=df_ep_clean, jitter=True, linewidth=1)
title = ax.set_title('Directors vs. Ratings')
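The writer and director strip plots show every episode as a point, which makes it hard to compare people with very different episode counts. A grouped summary is one possible complement. In this sketch the director names and ratings are placeholders, and `min_episodes` is an assumed filter that ignores people with a single episode.

```python
import pandas as pd

# Placeholder directors and ratings (illustrative only).
sample = pd.DataFrame({
    "Director": ["Director A", "Director A", "Director B", "Director B", "Director C"],
    "Imdb_Rating": [9.0, 9.4, 8.6, 8.8, 9.9],
})

min_episodes = 2  # assumed threshold: skip directors with a single episode

# Episode count and mean rating per director.
director_stats = sample.groupby("Director")["Imdb_Rating"].agg(
    episodes="count", mean_rating="mean"
)

# Keep directors with enough episodes and rank them by mean rating.
director_stats = director_stats[
    director_stats["episodes"] >= min_episodes
].sort_values("mean_rating", ascending=False)

print(director_stats)
```

Grouping by "Writer" instead of "Director" gives the matching table for writers.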
#!pip install bubbly
from __future__ import division
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode()
# Install bubbly if not already installed
try:
    from bubbly.bubbly import bubbleplot
except ModuleNotFoundError:
    !pip install bubbly
    from bubbly.bubbly import bubbleplot
figure = bubbleplot(dataset=df_ep_clean, x_column='US_viewers_million', y_column='Imdb_Rating'
, bubble_column='Season'
#, time_column='Original_Air_Date'
, size_column='IMDB_votes'
, color_column='Writer'
,x_title="Viewers (millions)", y_title="IMDB Ratings", title='Viewers and Ratings by Writers and IMDB Voters',
x_logscale=True, scale_bubble=3, height=650)
iplot(figure, config={'scrollZoom': True})
Conclusion
Season-wise Ratings: Season 1 shows the most consistent ratings across episodes, while Seasons 3 and 6 have the highest ratings overall, which may reflect stronger storytelling or audience engagement in those seasons.

Impact of Notable Deaths: Most top-rated episodes have between 0 and 4 notable deaths. Episodes with higher counts appear less often among the top-rated ones, although this is a visual pattern from the joint plot rather than a tested effect.

Viewer Figures vs. Ratings: Ratings stay fairly consistent up to approximately 8 million viewers. Higher viewership does not come with higher ratings: the correlation between US_viewers_million and Imdb_Rating computed earlier is about -0.45, a moderate negative relationship. Viewership grew over the show's run, but ratings did not rise with it.

Influence of Writers and Directors: In the strip plots, the episodes of some writers and directors cluster at higher ratings, which suggests that individual creative input affects episode quality and audience reception.

Viewers and Ratings by Writers and IMDb Voters: The bubble plot shows viewership, ratings, and IMDb votes for each episode, grouped by season and colored by writer. It highlights how viewer engagement and satisfaction vary across episodes and seasons.

Overall, the series kept its ratings high for most of its run (mean 8.9, median 9.0, minimum 4.3), with certain seasons and episodes standing out for their storytelling and execution.
These observations underscore the multifaceted dynamics of audience reception and creative influence within the Game of Thrones universe.